About the Course
Course Outline
This course provides an introduction to GPU, multi-core, and multi-node programming. It gives an overview of the importance of large-scale parallelism, threading concepts, multithreading methodology, and programming with common parallel models (MPI, CUDA, OpenMP, and Pthreads). The course will help students design and implement GPU and parallel applications.
Expected Outcomes
After completing this course, a student should be able to:
- Demonstrate familiarity with CUDA programming on the NVIDIA GPU architecture.
- Develop well-optimized threaded applications and improve HPC application performance on parallel computers (both SMP and distributed-memory parallel machines).
- Demonstrate an understanding of multi-node computing using MPI.
Topics to Be Covered
- Basic concepts in parallel and GPU programming
- Amdahl's Law (a worked example appears after this list).
- Distributed parallel programming and shared-memory parallel programming.
- Instruction-level parallelism, vectorization, and SSE instructions.
- Processes, threads and Message Passing Interface (MPI).
- Data parallel and task parallel programming.
- Synchronization and mutual exclusion issues.
- Synchronization primitives - mutex, critical sections, semaphores.
- Hazards, data races, deadlocks and subtle bugs in parallel programs.
- Parallel overheads, load balancing and performance tuning.
- Parallelization of a serial application: partition, communicate, agglomerate.
- Application scalability.
- Basics of the CUDA architecture.
- Programming with CUDA C (a minimal kernel sketch appears after this list).
- CUDA Optimization I, II, III.
- CUDA Tools - Visual Profiler, Parallel Nsight.
- CUDA Libraries.
- Directive-based programming.
- Recent topics - features of the CUDA 4.0 and 4.1 Toolkits.
- Multi-GPU programming and Unified Virtual Addressing (UVA).
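
For the Amdahl's Law topic above, the standard statement of the law is given below, with a small worked example whose numbers are illustrative rather than course data:

```latex
S(N) = \frac{1}{(1 - p) + \frac{p}{N}}
```

Here p is the fraction of the program that can be parallelized and N is the number of processors. For example, with p = 0.9 and N = 8, S = 1 / (0.1 + 0.9/8) ≈ 4.7, so even a 90%-parallel program speeds up less than 5x on 8 processors.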
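
For the "Programming with CUDA C" topic, the sketch below shows a minimal vector-addition kernel with explicit device allocation, host-device copies, and a kernel launch, in the style of the CUDA 4.x toolkits covered in the course. The kernel name vecAdd, the problem size, and the block size of 256 are illustrative assumptions, not course material.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Each thread computes one element of c = a + b.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host allocations and initialization.
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Device allocations and host-to-device copies.
    float *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // Copy the result back and check one element (expect 3.0).
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```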